Blackwell Optimality in Markov Decision Processes with Partial Observation

Authors

  • DINAH ROSENBERG
  • EILON SOLAN
  • NICOLAS VIEILLE

Abstract

A Blackwell ε-optimal strategy in a Markov Decision Process is a strategy that is ε-optimal for every discount factor sufficiently close to 1. We prove the existence of Blackwell ε-optimal strategies in finite Markov Decision Processes with partial observation.

1. Introduction. A well-known result by Blackwell [3] states that, in any Markov Decision Process (MDP hereafter) with finitely many states and finitely many actions, there is a pure stationary strategy that is optimal for every discount factor close enough to one. This strong optimality property is now referred to as Blackwell optimality.

In this paper we study finite MDPs with partial observations (p.o. hereafter); that is, finite MDPs in which, at the end of every stage, the decision maker receives a signal that depends randomly on the current state and on the action that has been chosen, but he observes neither the state nor his daily payoff (see, e.g., [2] and the references in [7]). MDPs with p.o. arise naturally in many contexts, such as models of machine replacement and quality control problems (see [12] and the references therein for this and additional applications), telecommunication networks (see [1] and the references therein), and intra-seasonal decisions of fishing vessel operators [9].

Here we address the problem of existence of Blackwell optimal strategies for a finite MDP with p.o. We prove that, in any such MDP and for every ε, there is a strategy that is Blackwell ε-optimal; that is, ε-optimal for every discount factor close enough to one. The strategy we construct is moreover ε-optimal in the n-stage MDP, for every n large enough. We also provide an example in which there is no Blackwell zero-optimal strategy.

The standard approach to an MDP with p.o. is to convert it into an auxiliary MDP with full observation and a Borel state space.
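Blackwell optimality concerns the behavior of the discounted value as the discount factor λ tends to one. As a minimal numerical sketch (the two-state chain and payoffs below are illustrative, not taken from the paper), the λ-discounted value of a fixed stationary strategy solves a linear system:

```python
import numpy as np

# Toy 2-state MDP with a fixed stationary strategy already applied:
# P is the induced transition matrix, r the per-stage payoff vector.
# These numbers are illustrative only, not from the paper.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
r = np.array([1.0, 0.0])

def discounted_value(P, r, lam):
    """Solve v = r + lam * P @ v, the lam-discounted value of the strategy."""
    n = len(r)
    return np.linalg.solve(np.eye(n) - lam * P, r)

# Blackwell (ε-)optimality asks about all discount factors close enough to 1:
for lam in (0.9, 0.99, 0.999):
    v = discounted_value(P, r, lam)
    print(lam, (1 - lam) * v)
```

As λ → 1 the normalized values (1 − λ)v converge to the long-run average payoff of the strategy, which is why a single strategy that is ε-optimal for every λ near 1 is such a strong requirement.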
The conditional distribution over the state space given the available information (sequence of past signals and past actions) plays the role of the state variable in the auxiliary MDP. This approach has been developed for instance in [14], [15] and [17]. An alternative state variable is defined in [5]. One then looks for optimal stationary strategies (strategies such that the action chosen in any given stage is only a function of the belief held on
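The belief-state construction can be written down directly: after playing action a and observing signal o, the prior over states is pushed through the transition law and then conditioned on the signal via Bayes' rule. A minimal sketch with made-up two-state kernels (none of these numbers come from the paper):

```python
import numpy as np

# Bayesian belief update for a finite MDP with partial observation.
# trans[a][s, s'] = P(next state s' | current state s, action a)
# obs[a][s', o]   = P(signal o | next state s', action a)
# All numbers are illustrative only.
trans = {0: np.array([[0.7, 0.3],
                      [0.4, 0.6]])}
obs = {0: np.array([[0.8, 0.2],
                    [0.3, 0.7]])}

def update_belief(b, a, o):
    """Condition the belief b on having played a and then observed signal o."""
    unnormalized = obs[a][:, o] * (b @ trans[a])
    return unnormalized / unnormalized.sum()

b0 = np.array([0.5, 0.5])        # prior over the two states
b1 = update_belief(b0, a=0, o=0) # posterior after one stage
```

The sequence of beliefs b0, b1, ... is exactly the state variable of the auxiliary full-observation MDP: the decision maker never sees the state, but the belief summarizes all the information in past actions and signals.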


Related articles

Blackwell Optimality in Markov Decision Processes with Partial Observation

We prove the existence of Blackwell ε-optimal strategies in finite Markov Decision Processes with partial observation. ∗Laboratoire d’Analyse Geometrie et Applications Institut Galilée, Université Paris Nord, avenue Jean Baptiste Clément, 93430 Villetaneuse, France. e-mail: [email protected] †Department of Managerial Economics and Decision Sciences, Kellogg School of Management, Northw...


Sensitive Discount Optimality via Nested Linear Programs for Ergodic Markov Decision Processes

In this paper we discuss the sensitive discount optimality for Markov decision processes. The n-discount optimality is a refined selective criterion that generalizes the average optimality and the bias optimality. Our approach is based on a system of nested linear programs. In the last section we provide an algorithm for the computation of the Blackwell optimal policy. The n-disco...


Applying Blackwell optimality: priority mean-payoff games as limits of multi-discounted games

We define and examine priority mean-payoff games, a natural extension of parity games. By adapting the notion of Blackwell optimality, borrowed from the theory of Markov decision processes, we show that priority mean-payoff games can be seen as a limit of special multi-discounted games.


On a Markov Game with One-Sided Incomplete Information

We apply the average cost optimality equation to zero-sum Markov games, by considering a simple game with one-sided incomplete information that generalizes an example of Aumann and Maschler (1995). We determine the value and identify the optimal strategies for a range of parameters.


Bounded Parameter Markov Decision Processes with Average Reward Criterion

Bounded parameter Markov Decision Processes (BMDPs) address the issue of dealing with uncertainty in the parameters of a Markov Decision Process (MDP). Unlike the case of an MDP, the notion of an optimal policy for a BMDP is not entirely straightforward. We consider two notions of optimality based on optimistic and pessimistic criteria. These have been analyzed for discounted BMDPs. Here we pro...



Journal:

Volume   Issue 

Pages  -

Publication year: 2002